Semaphore soft and hard hybrid architecture

ABSTRACT

A packet processing device has a plurality of processing stages, including a first processing stage and a second processing stage arranged as a packet processing pipeline. The first processing stage and the second processing stage each have a respective processor configured to process a packet of a packet stream and a respective resource manager having a respective local resource lock corresponding to a remote resource. The respective processor requests the respective resource manager to allocate the remote resource. The respective resource manager responds to the request to allocate the remote resource by locking the remote resource with the respective local resource lock and allocating the remote resource. The respective processor implements a packet processing operation associated with the allocated remote resource.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority to and the benefit of U.S. Provisional Patent Application No. 61/694,483 filed Aug. 29, 2012, and U.S. Provisional Patent Application No. 61/753,767 filed Jan. 17, 2013, the disclosures of both of which are incorporated by reference herein in their entirety. This application is related in content to U.S. patent application Ser. No. 13/891,707 for “Hybrid Dataflow Processor,” filed May 10, 2013, the entire disclosure of which is incorporated by reference herein in its entirety.

BACKGROUND

The present disclosure relates to a network device that processes packets.

The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventor, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.

A pipeline-type packet processing device typically includes processing stages arranged to process packets in what is understood as a pipeline configuration. In a pipeline configuration, packets are processed in stages with packets progressing from stage to stage through the pipeline. In some packet processing pipelines, such as a dataflow packet processing pipeline, respective stages are programmable. A stage in the packet processing pipeline receives a packet that had been processed at one or more previous stages, along with a processing context for the packet. Various resources, such as for example various table lookups, that are to be used by the respective stages in a packet processing pipeline typically are provided by units that are external to the packet processing pipeline and that are accessed by the stages when those services are needed to perform a processing operation.

SUMMARY

One or more embodiments of the disclosure generally relate to a network device that processes packets through a number of processing stages. Some packet processing operations update, i.e., read and then write, a value in a remote resource such as a memory location in a table. This situation is also referred to as “read-modify-write.” When the processing of a packet across multiple stages involves updating a remote resource, the possibility of contention for the remote resource, from packet processing operations implemented for prior or succeeding packets, is mitigated. According to example embodiments, when the processing of a given packet involves packet processing operations that will read from and subsequently write to a given remote resource, the remote resource is allocated or “locked” so that the processing of other packets cannot interfere with the remote resource until the processing carried out for the given packet reaches a point at which the remote resource is suitably released. Semaphores are used in example embodiments to lock corresponding remote resources. Notwithstanding the foregoing, not every example embodiment is required to possess all or even any of the features mentioned in this paragraph.

According to an aspect of the present disclosure, there is provided a packet processing device having processing stages, including a first processing stage and a second processing stage arranged as a packet processing pipeline; the first processing stage and the second processing stage each have: a respective processor configured to process a packet of a packet stream, and a respective resource manager having a respective local resource lock corresponding to a remote resource; the respective processor is configured to request the respective resource manager to allocate the remote resource; the respective resource manager is further configured to respond to the request to allocate the remote resource by locking the remote resource with the respective local resource lock and allocating the remote resource; the respective processor is further configured to implement a packet processing operation associated with the allocated remote resource.

According to another example embodiment, a packet processing device includes: processing stages arranged as a packet processing pipeline; the processing stages each having processor cores and buffers; the processor cores and buffers of the processing stages defining a plurality of paths, for simultaneous packet processing, through the packet processing pipeline; an ingress front end configured to direct each packet of an incoming stream of packets into one of the plurality of paths; the paths including a hard path and a soft path, the hard path being configured to process received ones of the incoming stream of packets with fixed latency, the soft path being configured to process received ones of the incoming stream of packets with variable latency; and the processing stages each further including a respective resource manager configured to request allocation of a remote resource, for a given packet of the incoming stream of packets, in response to an instruction from one of the processor cores. According to another example embodiment, the respective resource manager is further configured to allocate an available remote resource, thereby making the remote resource accessible only to the respective packet and to subsequent packet processing operations for that packet. Also, the processing stages are further configured, in an example embodiment, both to request the release of an allocated resource, and to receive from another processing stage a request to release an allocated resource and subsequently to release the resource. A remote resource which has been released is available to be allocated.

In yet another example embodiment, a packet processing method includes receiving, at a processor of a first processing stage, a first packet and a request for allocation of a remote resource; responding, by the processor, to the allocation request, by setting a semaphore corresponding to the remote resource to indicate a locked status; implementing a first packet processing operation, in association with the allocated remote resource, and in association with the first packet, to obtain a processed first packet; and outputting the processed first packet to a next processing stage of the pipeline packet processing device.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A is a highly simplified illustrative drawing showing a concept of operation according to example embodiments.

FIG. 1B is a highly simplified illustrative drawing showing another concept of operation according to example embodiments.

FIG. 2 is a highly simplified schematic diagram of a device according to example embodiments.

FIG. 3 is a highly simplified schematic diagram showing a particular aspect of a device according to example embodiments.

FIG. 4 is a flow diagram of an example method according to example embodiments.

FIG. 5 is a flow diagram of an example method according to example embodiments.

FIG. 6 is a flow diagram of an example method according to example embodiments.

FIG. 7 is a highly simplified schematic diagram of a device according to example embodiments.

FIG. 8 is a highly simplified schematic diagram of a device according to example embodiments.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

In the following discussion, descriptions of well-known functions and constructions may be omitted for increased clarity and conciseness.

FIG. 1A shows a Packet Processing Device (1000). Packet Processing Device (1000) includes a Plurality of Processing Stages (60), including a First Processing Stage (60-0), a Second Processing Stage (60-1), and other processing stages in accordance with particular example embodiments. Here, the terms “first” and “second” do not connote that the First Processing Stage (60-0) is limited to being the first processing stage in a pipeline configuration, nor that the Second Processing Stage (60-1) is limited to being the second processing stage in the pipeline configuration. Instead, “first” and “second” simply indicate a relative order in the pipeline configuration such that the First Processing Stage (60-0) is relatively earlier along the pipeline configuration than is the Second Processing Stage (60-1). In this discussion and in the drawing figures, the First Processing Stage (60-0) and Second Processing Stage (60-1) are illustrated adjacent each other in the pipeline configuration, but this arrangement is merely an example embodiment, and other processing stages are interposed between the First Processing Stage (60-0) and the Second Processing Stage (60-1) in example embodiments. Further, the First Processing Stage (60-0) is considered to be “upstream” with respect to the Second Processing Stage (60-1). Likewise, the Second Processing Stage (60-1) is considered to be “downstream” with respect to the first processing stage, according to an example embodiment. In an embodiment, the packet processing device (1000) is configured as a dataflow pipeline in which the respective stages (60) are programmable, and in which packets are received for processing at a downstream processing stage along with a processing context for processing the corresponding received packet. In an alternative example embodiment, one or more individual stages of the Plurality of Processing Stages (60) are not programmable but have hard-coded functionality. In yet another example embodiment, the Plurality of Processing Stages (60) includes some programmable processing stages and some hard-coded processing stages.

Each of the Plurality of Processing Stages (60) is substantially the same from a hardware perspective, according to an example embodiment. In another example embodiment, the Plurality of Processing Stages (60) includes different types of processing stages. The discussion below assumes that each of the Plurality of Processing Stages (60) has the same hardware structure. The components of each processing stage are described in more detail below. At the present, however, it is noted that each of the Plurality of Processing Stages (60) includes a respective Pool of Resource Locks (70). The respective Pool of Resource Locks (70) for the First Processing Stage (60-0) is shown as Pool of Resource Locks (70-0). Likewise, the respective Pool of Resource Locks (70) for the Second Processing Stage (60-1) is shown as Pool of Resource Locks (70-1). Each of the Plurality of Processing Stages (60) has a respective Pool of Resource Locks (70), according to example embodiments.

Within each respective Pool of Resource Locks (70) there exists a plurality of semaphores. In the present example embodiment, the semaphores are represented as Local Resource Locks (50, 51, . . . 5n). The number of semaphores used depends on particular implementation details, and FIG. 1A shows only a few semaphores within a Pool of Resource Locks (70) for the sake of simplicity of illustration. Each respective Pool of Resource Locks (70) includes its own Local Resource Locks (50, 51, . . . 5n). The Pool of Resource Locks (70-0) for First Processing Stage (60-0) includes Respective Local Resource Lock (50-0), Respective Local Resource Lock (51-0), . . . and Respective Local Resource Lock (5n-0). Likewise, the Pool of Resource Locks (70-1) for Second Processing Stage (60-1) includes Respective Local Resource Lock (50-1), Respective Local Resource Lock (51-1), . . . and Respective Local Resource Lock (5n-1).

Also shown in FIG. 1A are a Plurality of Remote Resources (85) including Remote Resource (85-0), Remote Resource (85-1), . . . and Remote Resource (85-n). Together, these represent remote resources that are available for READ and WRITE operations carried out by the Plurality of Processing Stages (60). In an example embodiment, the remote resources are memory locations that contain data such as, in a system for processing packets received on a network, forwarding tables, policy tables, MAC address learning, time-out management, and the like. This example is non-limiting, however, and the remote resources according to other exemplary embodiments contain data in structures other than tables. In an example embodiment, a remote resource contains a value which is accessed by either a READ or a WRITE Operation. In a READ Operation, a Processing Stage obtains a value stored in a memory location of the remote resource. In a WRITE Operation, the Processing Stage stores a value into the memory location of the remote resource. It is to be noted, however, that a remote resource may be locked by a process intended to subsequently perform alternate operations in addition to READ and WRITE operations, for example updating or synchronizing multiple tables and/or memory locations.

Between the Plurality of Processing Stages (60) and the Plurality of Remote Resources (85) is illustrated a Plurality of Engines (80). In an example embodiment, a Processing Stage requires an Engine to access a Remote Resource. In other example embodiments, however, the Plurality of Processing Stages (60) are able to access the Plurality of Remote Resources (85) directly or without the use of an Engine. According to another example embodiment, a given Remote Resource is accessed through a combination of engines.

At the lower portion of FIG. 1A, there is illustrated a more detailed view of the Pool of Resource Locks (70-0) of the First Processing Stage (60-0), according to an example embodiment. This more detailed representation of the Pool of Resource Locks (70-0) is shown as a table containing two columns. Each row of the table represents a Local Resource Lock such as Respective Local Resource Lock (50-0), Respective Local Resource Lock (51-0), . . . Respective Local Resource Lock (5n-0). The left column of the table includes, in each row, an identifier of a corresponding one of the Plurality of Remote Resources (85). The right column of the table indicates the status of the corresponding Remote Resource. In this non-limiting, simplified example the status is UNLOCKED or LOCKED. The table of FIG. 1A indicates that, in this example, the Remote Resource (85-0) is UNLOCKED, the Remote Resource (85-1) is LOCKED, and the Remote Resource (85-n) is UNLOCKED.
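By way of a non-limiting illustration, the two-column table described above can be sketched in Python as a mapping from a remote resource identifier to a lock status. The names `LockStatus`, `pool_70_0`, and `is_locked`, and the use of strings as identifiers, are assumptions introduced only for this sketch and are not taken from the figures.

```python
from enum import Enum


class LockStatus(Enum):
    UNLOCKED = "UNLOCKED"
    LOCKED = "LOCKED"


# Hypothetical in-memory model of the Pool of Resource Locks (70-0).
# Key: identifier of a remote resource (left column of the table).
# Value: status of that remote resource (right column of the table).
pool_70_0 = {
    "85-0": LockStatus.UNLOCKED,
    "85-1": LockStatus.LOCKED,
    "85-n": LockStatus.UNLOCKED,
}


def is_locked(pool, resource_id):
    """Return True when the local lock for resource_id shows LOCKED."""
    return pool[resource_id] is LockStatus.LOCKED
```

In this sketch, a status check is a purely local lookup, which mirrors the point that each processing stage consults its own pool.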

In the example embodiment shown in FIG. 1A, the Processing Stages are connected to a Resource Release Bus (300), also referred to as a lock release message path. The Resource Release Bus (300) allows for communication, such as message exchange, among the Plurality of Processing Stages (60). According to an example embodiment, the Resource Release Bus (300) allows for messages to be passed from downstream stages to upstream stages. The Resource Release Bus (300), however, is implemented in other ways according to example embodiments, and a bus that permits message exchange is a non-limiting example. In the present example embodiment, the Resource Release Bus (300) does not carry the packets being processed through the pipeline configuration.

A single Packet (20-0) is shown in FIG. 1A, for simplicity of illustration. The Packet (20-0) enters the First Processing Stage (60-0), and will pass to the Second Processing Stage (60-1) after any packet processing operations programmed into the First Processing Stage (60-0) are concluded.

FIG. 1A also illustrates part of an overall concept of operation according to an example embodiment. In particular, 1a-1 indicates the entering of Packet (20-0) into the First Processing Stage (60-0). At this instant, all locks of the Pool of Resource Locks (70-0) have an UNLOCKED status. The packet processing operations programmed into the First Processing Stage (60-0) include, for example, a READ of Remote Resource (85-1). Before carrying out the READ operation, however, the packet processing operations programmed into First Processing Stage (60-0) carry out a check to determine whether Remote Resource (85-1) is in a LOCKED or in an UNLOCKED status. To make this determination, the Pool of Resource Locks (70-0) is checked. More particularly, the particular Respective Local Resource Lock (51-0) corresponding to Remote Resource (85-1) is checked. At 1a-2a, finding that the Respective Local Resource Lock (51-0) has an UNLOCKED status, the First Processing Stage (60-0) sets the Respective Local Resource Lock (51-0) to a LOCKED status. At 1a-2b, having locked Respective Local Resource Lock (51-0), the First Processing Stage (60-0) carries out the READ operation with respect to Remote Resource (85-1) using an appropriate one of the Plurality of Engines (80). The value of Remote Resource (85-1) is thus read by the First Processing Stage (60-0) and this value is now available to use in accordance with the processing programmed into First Processing Stage (60-0) and others of the Plurality of Processing Stages (60). At 1a-3, the Packet (20-0) is transferred to Second Processing Stage (60-1), together with other information to be described later.

In this example embodiment, the setting of the status of Respective Local Resource Lock (51-0), corresponding to Remote Resource (85-1), from UNLOCKED to LOCKED, implements a locking of the remote resource. To put it another way, the Respective Local Resource Lock (51-0) acts as a semaphore that indicates that Remote Resource (85-1) is an allocated remote resource.

It is noted that, in an embodiment, the mechanism for locking Remote Resource (85-1) and the mechanism for subsequent reading of the value of Remote Resource (85-1) are both located and activated locally within the First Processing Stage (60-0). If, for example, in contrast, semaphores were to be maintained in a shared pool of semaphores located outside the First Processing Stage (60-0), delays in making the determination as to the LOCKED or UNLOCKED status of Remote Resource (85-1) would likely result. Although maintaining a shared pool of semaphores (not shown) streamlines some aspects of the processor architecture, in an example embodiment, one of the Plurality of Engines (80) would need to obtain LOCKED or UNLOCKED status information from such a shared pool of semaphores.

Since the locking of Remote Resource (85-1) and the subsequent reading of the value of Remote Resource (85-1) take place within the First Processing Stage (60-0), within a very short time of each other, it is noted that the lock and read operations constitute a Lock & Read Operation (105). It is further noted that the Lock & Read Operation (105) allocates the Remote Resource (85-1) by setting a value of Respective Local Resource Lock (51-0), thereby preventing First Processing Stage (60-0) from carrying out any operations for other packets with respect to Remote Resource (85-1) until the status of Remote Resource (85-1) in Respective Local Resource Lock (51-0) goes from LOCKED to UNLOCKED.

It is noted that, in First Processing Stage (60-0), the Lock & Read Operation (105) includes a READ operation with respect to the allocated Remote Resource (85-1). This READ operation is understood to constitute a packet processing operation associated with the allocated Remote Resource (85-1).
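The Lock & Read Operation (105) described above can be sketched, in a simplified and non-limiting manner, as a check of the local lock followed by a read of the guarded remote value. The class `ProcessingStage`, the method `lock_and_read`, the example values, and the pairing of lock (50-0) with Remote Resource (85-0) are assumptions made for illustration; only the pairing of lock (51-0) with Remote Resource (85-1) is taken from the description.

```python
class ProcessingStage:
    """Hypothetical model of one processing stage with local resource locks."""

    def __init__(self, lock_to_resource):
        # Maps a local lock identifier to the remote resource it guards.
        self.lock_to_resource = dict(lock_to_resource)
        # Every local lock starts UNLOCKED (False).
        self.locked = {lock_id: False for lock_id in self.lock_to_resource}

    def lock_and_read(self, lock_id, remote_resources):
        """Lock the local semaphore, then read the guarded remote value.

        Returns (True, value) on success, or (False, None) when the lock
        is already held for another packet.
        """
        if self.locked[lock_id]:
            return (False, None)
        self.locked[lock_id] = True  # allocate the remote resource
        value = remote_resources[self.lock_to_resource[lock_id]]  # READ
        return (True, value)


# Illustrative values only.
remote_resources = {"85-0": 10, "85-1": 42}
stage_60_0 = ProcessingStage({"50-0": "85-0", "51-0": "85-1"})
first = stage_60_0.lock_and_read("51-0", remote_resources)   # succeeds
second = stage_60_0.lock_and_read("51-0", remote_resources)  # refused
```

In this sketch the second attempt is refused because the lock is still held; what a stage does with a refused request (for example, buffering the packet) is outside the scope of the sketch.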

Having thus described an overall concept of operation for the Lock & Read Operation (105), the discussion will proceed to a further concept of operation relating to a Write & Release Operation (106).

At 1a-3 of FIG. 1A, Packet (20-0) completes processing at, and exits from, the First Processing Stage (60-0). This packet, prior to undergoing the processing programmed into First Processing Stage (60-0), is understood to constitute a first packet. At 1a-3, however, this packet is understood to constitute a processed first packet. In some circumstances, depending on the processing programmed into the given processing stage, the Packet (20-0) changes. Such changes, however, are not strictly required, and in some situations the Packet (20-0) stays the same. Therefore, it is noted that the processing programmed into First Processing Stage (60-0), performed with respect to the first packet, results, in any event, in the passing along of a processed first packet.

Turning to FIG. 1B, Packet (20-1) represents a second packet. At 1b-1, the Packet (20-1) is passed to Second Processing Stage (60-1). In this example, the Packet (20-1) is a packet which, in a prior processing stage, resulted in the locking of a remote resource such as Remote Resource (85-1). For the sake of this discussion, Packet (20-1) is the processed first packet which exits from First Processing Stage (60-0) at 1a-3.

In FIG. 1B, at 1b-2a, the processing programmed into Second Processing Stage (60-1) causes a WRITE operation to be carried out with respect to Remote Resource (85-1). Having concluded the WRITE operation, it is suitable to release the Remote Resource (85-1). At 1b-2b, to effect the release of Remote Resource (85-1), Second Processing Stage (60-1) sends a Resource Lock Release Request over Resource Release Bus (300). As shown in FIG. 1B, the Resource Lock Release Request includes two items of information. One item of information in the Resource Lock Release Request is an identifier of the one of the Plurality of Processing Stages (60) that has the Respective Local Resource Lock (51-0) corresponding to the locked Remote Resource (85-1). The other item of information is an identifier of the particular local resource lock (here, Respective Local Resource Lock (51-0)) to be released.

Since the Resource Release Bus (300) needs to carry the Resource Lock Release Request only to upstream ones of the Plurality of Processing Stages (60), the Resource Release Bus (300) communicates in one direction, e.g., the upstream direction, in an example embodiment.

At 1b-3a, the Packet (20-1) exits from the Second Processing Stage (60-1) as a processed second packet and enters into a subsequent one of the Plurality of Processing Stages (60).

At 1b-3b, the First Processing Stage (60-0) receives the Resource Lock Release Request addressed to the First Processing Stage (60-0), and changes the status of Remote Resource (85-1) in Respective Local Resource Lock (51-0) of the Pool of Resource Locks (70-0) by replacing LOCKED with UNLOCKED (illustrated as “LOCKED->UNLOCKED” in FIG. 1B). After the status of Remote Resource (85-1) is updated in Respective Local Resource Lock (51-0), the Remote Resource (85-1) is no longer locked, and is available to be allocated for use in a packet processing operation for another packet.

Since the WRITE operation and the subsequent sending of the Resource Lock Release Request take place within the Second Processing Stage (60-1), within a very short time of each other, it is noted that these two actions constitute a Write & Release Operation (106). It is further understood that the Write & Release Operation (106) releases, or deallocates, the Remote Resource (85-1) by causing the First Processing Stage (60-0) to set a value of its Respective Local Resource Lock (51-0), thereby enabling First Processing Stage (60-0) to carry out operations with respect to Remote Resource (85-1) if necessary.
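As a non-limiting sketch of the Write & Release Operation (106), the release request can be modeled as a small message carrying the two identifiers described above, delivered to upstream stages over a simple list standing in for the Resource Release Bus (300). The names `ReleaseRequest`, `Stage`, `on_release_request`, and `write_and_release`, and the string status values, are hypothetical and are introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseRequest:
    stage_id: str  # stage that holds the local resource lock
    lock_id: str   # particular local resource lock to be released


class Stage:
    """Hypothetical processing stage holding a pool of local locks."""

    def __init__(self, stage_id, lock_ids):
        self.stage_id = stage_id
        self.locks = {lock_id: "UNLOCKED" for lock_id in lock_ids}

    def on_release_request(self, request):
        # Ignore requests addressed to a different stage.
        if request.stage_id != self.stage_id:
            return False
        self.locks[request.lock_id] = "UNLOCKED"
        return True


def write_and_release(remote, resource_id, value, owner_stage_id, lock_id,
                      upstream_stages):
    """WRITE the remote value, then send the release request upstream."""
    remote[resource_id] = value
    request = ReleaseRequest(owner_stage_id, lock_id)
    for stage in upstream_stages:  # stand-in for the release bus
        stage.on_release_request(request)
    return request


stage_60_0 = Stage("60-0", ["50-0", "51-0"])
stage_60_0.locks["51-0"] = "LOCKED"  # left locked by an earlier Lock & Read
remote = {"85-1": 7}
sent = write_and_release(remote, "85-1", 8, "60-0", "51-0", [stage_60_0])
```

After the call, the remote value has been written and the lock held in the upstream stage shows UNLOCKED again.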

In an example embodiment, the Resource Lock Release Request is understood to constitute, more generally, a release request message.

In an example embodiment, the status of one of the Plurality of Remote Resources (85) is represented in an alternative manner, such as a single binary digit with one value representing a LOCKED status, and the other value representing an UNLOCKED status. Other implementations of the Pool of Resource Locks (70-0) are within the ability of a person familiar with this field. Further, the implementation of semaphores may be substituted with other locking mechanisms familiar to those skilled in the art, including a single client lock such as a mutex or the like.
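The single-binary-digit representation mentioned above can be sketched, purely as an assumption-laden illustration, as one bit per local lock packed into an integer. The class `BitLockPool` and its method names are hypothetical, as is the choice of 1 for LOCKED and 0 for UNLOCKED.

```python
class BitLockPool:
    """Hypothetical pool of locks stored as one bit per lock."""

    def __init__(self, size):
        self.size = size
        self.bits = 0  # bit i set to 1 means lock i is LOCKED

    def _mask(self, index):
        if not 0 <= index < self.size:
            raise IndexError("lock index out of range")
        return 1 << index

    def try_lock(self, index):
        """Set the bit if it is clear; report whether the lock was taken."""
        mask = self._mask(index)
        if self.bits & mask:
            return False
        self.bits |= mask
        return True

    def release(self, index):
        """Clear the bit, returning the lock to UNLOCKED."""
        self.bits &= ~self._mask(index)

    def is_locked(self, index):
        return bool(self.bits & self._mask(index))
```

For example, with `pool = BitLockPool(4)`, a first `pool.try_lock(1)` succeeds, a second returns `False`, and `pool.release(1)` makes the lock available again.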

According to an example embodiment, therefore, local locks are used to restrict the accessibility of a remote resource to the processing for a single packet, along the respective processing stages. According to an example embodiment, the First Processing Stage (60-0) reads from the Remote Resource (85-1) and passes the Packet (20-0) to the Second Processing Stage (60-1) as the Packet (20-1). The Second Processing Stage (60-1) then writes a value to the Remote Resource (85-1). The semaphore (Respective Local Resource Lock (51-0)) allows exclusive access by the Second Processing Stage (60-1) for the purpose of writing. However, without such a semaphore, the Second Processing Stage (60-1) would not be guaranteed exclusive access to the Remote Resource (85-1). That is to say, the First Processing Stage (60-0) would have the opportunity, in response to a new packet requiring the same Remote Resource (85-1), to begin reading from the Remote Resource (85-1) while the Second Processing Stage (60-1) is beginning to write to the Remote Resource (85-1), resulting in possible problems due to such contention. The use of semaphores, in Packet Processing Device (1000), thus avoids such contention problems.
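The contention problem described above can be illustrated with a deliberately simplified, sequential Python sketch in which two packets each increment a counter held in Remote Resource (85-1). The function `run_two_packets`, the counter semantics, and the fixed interleaving are assumptions chosen for illustration; a real device operates concurrently and is not modeled here.

```python
def run_two_packets(use_lock):
    """Interleave read-modify-write for packets A and B on one counter."""
    remote = {"85-1": 0}
    locked = False

    # Packet A, first stage: Lock & Read.
    a_value = remote["85-1"]
    locked = use_lock

    # Packet B reaches the first stage before A's write has happened.
    b_value = None
    if not locked:
        b_value = remote["85-1"]  # stale read when no lock is used

    # Packet A, second stage: Write & Release.
    remote["85-1"] = a_value + 1
    locked = False

    # Packet B, if it was held back, reads only after the release.
    if b_value is None:
        b_value = remote["85-1"]

    # Packet B, second stage: write.
    remote["85-1"] = b_value + 1
    return remote["85-1"]


with_lock = run_two_packets(use_lock=True)      # 2: both updates kept
without_lock = run_two_packets(use_lock=False)  # 1: one update lost
```

With the lock, the second packet observes the first packet's write; without it, both packets read the same starting value and one update is lost.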

FIG. 2 shows a more detailed view of the Packet Processing Device (1200). In FIG. 2, the Plurality of Processing Stages (60) includes First Processing Stage (60-0), Second Processing Stage (60-1), other processing stages, and Final Processing Stage (60-f).

The Plurality of Engines (80) includes Engine (80-0), Engine (80-1), other engines, and Engine (80-n). These engines, according to an example embodiment, are input/output (I/O) processors that can interface with appropriate ones of the Plurality of Remote Resources (85) on behalf of the Plurality of Processing Stages (60). The Plurality of Processing Stages (60) communicate with the Plurality of Engines (80) through communication paths that are understood to constitute Engine Connections (100-0, -1, . . . -f). The Plurality of Engines (80) communicate with the Plurality of Remote Resources (85) through communication paths that are understood to constitute Engine to Resource Connections (101-0, -1, . . . -f). The engine connections and the engine to resource connections are implemented, in an example embodiment, by an interconnect or other suitable connections.

Each of the Plurality of Processing Stages (60) is substantially similar from a hardware point of view. Taking the First Processing Stage (60-0) as an example, the processing stage includes one or more Respective Processors (30-0, 30-1, . . . 3n-0). Each processor is configured to accept a packet such as Packet (20-0) from a Plurality of Parallel Packet Streams (20).

The First Processing Stage (60-0) further includes a Respective Resource Manager (90-0) having Buffers (40-0, 41-0, . . . 4n-0). The Respective Resource Manager (90-0) of the First Processing Stage (60-0) further includes the Pool of Resource Locks (70-0) and the Respective Local Resource Locks (50-0, 51-0, . . . 5n-0).

The First Processing Stage (60-0) is configured to receive the Packet (20-0) of a Plurality of Parallel Packet Streams (20). Although the packets of the Plurality of Parallel Packet Streams (20) are shown entering First Processing Stage (60-0) in a parallel fashion, the actual circuitry over which packets travel need not actually be implemented in such a manner. In FIG. 2, there is shown a Packet Bus (200) along which each packet travels, according to an example embodiment.

As previously mentioned in the context of FIGS. 1A and 1B, and further below in FIG. 3 which will be discussed momentarily, each of the Plurality of Processing Stages (60) is configured to implement a Lock & Read Operation (105), to implement a Write & Release Operation (106), to accept a Resource Lock Release Request received over the Resource Release Bus (300), and, subsequently, to release the indicated Resource Lock. According to an example embodiment, the First Processing Stage (60-0) is configured, in accordance with the processing programmed into the stage, and in response to the entrance of Packet (20-0), to allocate the Remote Resource (85-0) by locking Respective Local Resource Lock (51-0) of the Pool of Resource Locks (70-0), i.e., the Lock Operation.

The First Processing Stage (60-0) is further configured to perform the Read Operation, through its Respective Resource Manager to Engine Connection (100-0), through Engine (80-0), and via Engine to Resource Connection (101-0) to Remote Resource (85-0). The First Processing Stage (60-0) is yet further configured to pass Packet (20-0) along the Packet Bus (200) as Packet (20-1).

The Second Processing Stage (60-1) is configured, in response to receiving Packet (20-1), to perform the Write Operation, through Respective Resource Manager to Engine Connection (100-1), through Engine (80-1), and via Engine to Resource Connection (101-0) to Remote Resource (85-0). The Second Processing Stage (60-1) is further configured to pass the Packet (20-1) along the Packet Bus (200) as Packet (20-2). The Second Processing Stage (60-1) is yet further configured to request the release of Remote Resource (85-0) by causing the unlocking of the Respective Local Resource Lock (51-0), i.e., the Release Operation. To cause the unlocking, the Second Processing Stage (60-1) sends, along Resource Release Bus (300), a release request containing identifiers indicating the First Processing Stage (60-0) and its particular Respective Local Resource Lock (51-0). The First Processing Stage (60-0) is configured to receive the release request and to subsequently unlock the Respective Local Resource Lock (51-0), thereby deallocating the Remote Resource (85-0). The Remote Resource (85-0) is then available for subsequent allocation.

FIG. 3 illustrates an example embodiment of a Packet Processing Device (1300). The Packet Processing Device (1300) includes a Receiver (110) for receiving packets, an Interface Arbiter (120) for directing packets along suitable paths through the pipeline configuration, a First Processing Stage (60-0) of a Plurality of Processing Stages (60), a Plurality of Engines (80), and a Remote Resource (85-0) of a Plurality of Remote Resources (85). The Receiver (110) receives a Packet (20-0) and subsequently passes the Packet (20-0) to the Interface Arbiter (120). The Interface Arbiter (120) directs the Packet (20-0) into the First Processing Stage (60-0). The Interface Arbiter (120) is also referred to as an ingress front end, in example embodiments.

The First Processing Stage (60-0) includes a Respective Processor (30-0). The Respective Processor (30-0) includes a Packet Memory (160), an Execution Memory (150), an Instruction Memory (130), and a Processor Core (140). The Respective Processor (30-0) is configured to perform an operation consistent with the operations already described with respect to FIG. 1A and FIG. 1B, namely, a Lock & Read Operation (105) and a Write & Release Operation (106).

The Packet Memory (160) stores a packet from the Plurality of Parallel Packet Streams (20). The Execution Memory (150) stores information related to the processing of the packet such as, for example, variable values and an indication as to whether one or more of the Plurality of Remote Resources (85) is allocated for the processing of the packet. When a packet exits from the First Processing Stage (60-0) to the Second Processing Stage (60-1), the contents of both the Packet Memory (160) and the Execution Memory (150) are passed along, according to an example embodiment, as an Execution Context.
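As a non-limiting sketch, the Execution Context described above can be modeled as a record that bundles the packet contents with the execution memory and travels with the packet from stage to stage. The classes `ExecutionMemory` and `ExecutionContext`, the functions `first_stage` and `second_stage`, the variable name `counter`, and the increment performed are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionMemory:
    # Variable values produced while processing the packet.
    variables: dict = field(default_factory=dict)
    # Identifiers of remote resources allocated for this packet.
    allocated_resources: set = field(default_factory=set)


@dataclass
class ExecutionContext:
    packet_memory: bytes
    execution_memory: ExecutionMemory = field(default_factory=ExecutionMemory)


def first_stage(ctx, remote):
    """Read the remote value and record that the resource is allocated."""
    ctx.execution_memory.variables["counter"] = remote["85-1"]
    ctx.execution_memory.allocated_resources.add("85-1")
    return ctx


def second_stage(ctx, remote):
    """Use the carried-over value to write, then clear the allocation."""
    if "85-1" in ctx.execution_memory.allocated_resources:
        remote["85-1"] = ctx.execution_memory.variables["counter"] + 1
        ctx.execution_memory.allocated_resources.discard("85-1")
    return ctx


remote = {"85-1": 5}
ctx = ExecutionContext(b"\x00\x01")
ctx = second_stage(first_stage(ctx, remote), remote)
```

The point of the sketch is that the downstream stage learns from the carried context, rather than from any shared state, that the resource was allocated upstream for this packet.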

The Processor Core (140) carries out operations in accordance with thecontents of the Instruction Memory (130). The particular operationscarried out depend on the type of packet being processed. In exampleembodiments the Processor Core (140) has been discussed in the contextof programmable operations; it is noted that Processor Core (140) may beimplemented multiple manners, such as a programmable processor core oras a hardware-designed processor core.

As shown in FIG. 3, multiple instances of Respective Processor (30-0) are provided in another example embodiment. The presence of multiple instances of Respective Processor (30-0) makes it possible to carry out a number of operations within the First Processing Stage (60-0) before the packet is provided to the Second Processing Stage (60-1). According to an example embodiment, the multiple instances of Respective Processor (30-0) carry out a packet processing operation associated with a single packet. According to another example embodiment, the multiple instances of Respective Processor (30-0) carry out a packet processing operation associated with multiple packets. According to yet another example embodiment, packet processing operations are performed on packets in different streams, depending on the particular processing context.

The First Processing Stage (60-0) further includes a Respective Resource Manager (90-0) which includes a Buffer (40), a Pool of Resource Locks (70-0), and an I/O Access Unit (190). The Buffer (40) stores the Packet Memory (160) and the Execution Memory (150) for packets whose processing involves access to any of the Plurality of Remote Resources (85).

The I/O Access Unit (190) further includes a Driver Table (180) utilized in the accessing of the Plurality of Engines (80) and, in some example embodiments, the Remote Resource (85-0).

The First Processing Stage (60-0) communicates with the Plurality of Engines (80) by a Respective Resource Manager to Engine Connection (100-0), which then communicates with the Remote Resource (85-0) by an Engine to Resource Connection (105-0).

A more detailed architecture that is suitable for implementing one of the Plurality of Processing Stages (60), according to an example embodiment, is described in Jakob Carlstrom and Thomas Boden, “Synchronous Dataflow Architecture for Network Processors,” IEEE MICRO (September-October 2004), the content of which is incorporated in its entirety herein for its useful description and example architecture. An additional architecture is found in the disclosure of U.S. patent application Ser. No. 13/891,707 for “Hybrid Dataflow Processor,” filed May 10, 2013, owned by the same assignee and incorporated herein by reference for its useful descriptions and example architectures including descriptions of both hard and soft path packet processing operations.

FIG. 4 is a flow diagram of an example method according to example embodiments. S400 occurs when some Processing Stage accepts a Packet. The Processing Stage will read the Execution Context of the Packet and obtain Instructions based on the Execution Context. At S401, the Processing Stage will determine whether a Remote Resource is needed, in accordance with the Instructions.

If at S401 a Remote Resource is not needed, the Processing Stage will determine what other Packet Processing Operation is to be performed. At S402, the thus-determined Packet Processing Operation is performed. At S408, the Packet is passed along the Packet Processing Pipeline as a Processed First Packet.

If at S401 the Instructions indicate that a Remote Resource is needed, at S403 the Processing Stage will check to see if a Resource Lock is available for the needed Remote Resource.

If at S403 the Resource Lock is unavailable, i.e., in a LOCKED status as indicated by the corresponding Respective Local Resource Lock, at S404 the Processing Stage waits until the Resource Lock becomes available, and the Packet remains buffered in Buffer (40).

If at S403 the Resource Lock is available, i.e., in an UNLOCKED status, the Processing Stage at S405 locks the Local Resource Lock by setting the status to LOCKED, and then carries out a Packet Processing Operation in accordance with the Instructions. According to an example embodiment, the information that the Local Resource Lock has been set to the status of LOCKED travels as part of the execution context of the packet which is passed from the Processing Stage at S408.

More particularly, at S406, the Processing Stage accesses the allocated Remote Resource, through an Engine, to Read a value from the Remote Resource. At S407, the value thus read is stored in the Packet's Execution Context. At S408, the Processed First Packet is passed along the pipeline configuration to a subsequent Processing Stage.
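The flow of FIG. 4 can be sketched as a single function, given here only as an illustration under stated assumptions. Dictionaries stand in for the Execution Context, the Local Resource Locks, and the Remote Resources; the function name and the dictionary keys are hypothetical, and the Engine access is reduced to a dictionary lookup.

```python
UNLOCKED, LOCKED = "UNLOCKED", "LOCKED"


def lock_and_read_stage(ctx, instructions, local_locks, remote_resources, buffer):
    """Return the context to pass downstream, or None if the packet must wait."""
    needed = instructions.get("remote_resource")            # S401: is a Remote Resource needed?
    if needed is None:
        ctx["ops"].append(instructions.get("op", "other"))  # S402: some other operation
        return ctx                                          # S408: pass the packet along
    if local_locks[needed] == LOCKED:                       # S403: lock unavailable
        buffer.append(ctx)                                  # S404: packet stays buffered
        return None
    local_locks[needed] = LOCKED                            # S405: set the lock to LOCKED
    ctx["allocated"] = needed                               # lock status travels in the context
    ctx["value"] = remote_resources[needed]                 # S406/S407: read and store the value
    return ctx                                              # S408: pass the packet along


locks = {"85-0": UNLOCKED}
resources = {"85-0": 7}
waiting = []
first = lock_and_read_stage({"ops": []}, {"remote_resource": "85-0"}, locks, resources, waiting)
second = lock_and_read_stage({"ops": []}, {"remote_resource": "85-0"}, locks, resources, waiting)
```

In this sketch the first packet obtains the lock and the value, while the second packet, arriving before any release, is placed in the buffer and not forwarded.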

FIG. 5 is a flow diagram of an example method according to example embodiments. The example method of FIG. 5 may be applied to multiple example embodiments as discussed and as understood by one of ordinary skill in the art. S500 occurs when a Processing Stage accepts a Packet. The Processing Stage reads the Execution Context of the Packet and obtains Instructions based on the Execution Context. At S501, the Processing Stage determines whether a Resource is already Allocated in association with the received Packet.

At S501, if a Resource is not allocated, at S503 some Packet Processing Operation will be carried out in accordance with the Instructions. The Packet will then be passed along the Packet Processing Pipeline as a Processed Second Packet.

At S501, if a Resource is allocated, at S502 the Processing Stage will then determine whether the Instructions indicate a WRITE Operation is needed.

At S502, if a WRITE Operation is not needed, at S503 some other Packet Processing Operation is carried out in accordance with the Instructions.

At S502, if a WRITE Operation is needed, at S504 the Processing Stage carries out a WRITE operation with respect to the Allocated Resource. The Processing Stage then causes both S505 and S506 to occur.

At S505, the Processing Stage passes along the Processed Second Packet. At S506, the Processing Stage generates and sends, along the Resource Release Bus, a Resource Lock Release Request containing Identifiers of the Respective Resource Manager and of the particular Resource Lock to be released.
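The flow of FIG. 5 can be sketched in the same style, again only as an illustration. A list stands in for the Resource Release Bus, and the keys "allocated", "lock_owner", and "write_value" are hypothetical names for the allocation indication, the identifier of the Respective Resource Manager that holds the lock, and the value to be written.

```python
def write_and_release_stage(ctx, instructions, remote_resources, release_bus):
    """Write to an already allocated resource, then request release of its lock."""
    allocated = ctx.get("allocated")                              # S501: resource allocated?
    if allocated is None or "write_value" not in instructions:    # S502: WRITE needed?
        ctx.setdefault("ops", []).append("other")                 # S503: some other operation
        return ctx
    remote_resources[allocated] = instructions["write_value"]     # S504: WRITE
    release_bus.append({"manager": ctx["lock_owner"],             # S506: Resource Lock
                        "lock": allocated})                       # Release Request
    return ctx                                                    # S505: pass the packet along


resources = {"85-0": 7}
bus = []
ctx = {"allocated": "85-0", "lock_owner": "90-0"}
out = write_and_release_stage(ctx, {"write_value": 8}, resources, bus)
```

The request carries both identifiers named in the text, so that a stage receiving it can tell whether it is the addressee and which lock to release.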

FIG. 6 is a flow diagram of an example method according to example embodiments. Each Processing Stage is configured to monitor the Resource Release Bus (300) for any Resource Lock Release Request. S601 represents continuous monitoring of the Resource Release Bus (300) until a Resource Lock Release Request (RLRR) is received.

At S601, when a Processing Stage does receive a Resource Lock Release Request, the Processing Stage determines at S602 whether the Resource Lock Release Request is addressed to that Processing Stage.

At S602, when the Processing Stage determines that the Resource Lock Release Request does not indicate the address of that Processing Stage, the Resource Lock Release Request is ignored.

At S602, when a Processing Stage determines that the Resource Lock Release Request does indicate the address of that Processing Stage, at S603 the Resource Lock Release Request is implemented so as to release whichever Local Resource Lock is identified in the Resource Lock Release Request.
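The handling of a Resource Lock Release Request in FIG. 6 can be sketched as follows. This is an illustrative model only: the class name StageLocks and the request keys "manager" and "lock" are hypothetical, and the continuous monitoring of S601 is reduced to delivering one request to every stage.

```python
class StageLocks:
    """Hypothetical model of the Local Resource Locks held by one Processing Stage."""

    def __init__(self, address, lock_ids):
        self.address = address
        self.locks = {lock_id: "UNLOCKED" for lock_id in lock_ids}

    def on_release_request(self, request):
        """Handle one request seen on the bus; return True if it was implemented."""
        if request["manager"] != self.address:       # S602: not addressed to this stage
            return False                             # the request is ignored
        self.locks[request["lock"]] = "UNLOCKED"     # S603: release the identified lock
        return True


stage0 = StageLocks("90-0", ["85-0"])
stage1 = StageLocks("90-1", ["85-0"])
stage0.locks["85-0"] = "LOCKED"
stage1.locks["85-0"] = "LOCKED"

request = {"manager": "90-0", "lock": "85-0"}
handled = [stage.on_release_request(request) for stage in (stage0, stage1)]
```

Every stage sees the same request, but only the addressed stage changes the state of its lock.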

It is to be noted that, in an example embodiment, the use of the Local Resource Lock within the Processing Stage allows the treatment of multiple engines as a single resource. In such an example embodiment, multiple engines are locked by setting the status of a single Local Resource Lock to LOCKED.

FIG. 7 is a highly simplified schematic diagram of a device according to example embodiments, in which the Resource Locks are centrally managed by a Central Resource Lock Engine (700-0). In FIG. 7, a Packet Processing Device (1700) includes a Packet Bus (200), a Plurality of Packet Streams (20), a Plurality of Engines (80), a plurality of Remote Resources (85-0, 85-1, . . . 85-n), a First Processing Stage (760-0), a Second Processing Stage (760-1), a Third Processing Stage (760-2), a Fourth Processing Stage (760-3), and a Central Resource Lock Engine (700-0). The Central Resource Lock Engine (700-0) includes a Central Resource Lock Pool Local to Engine (780) which further includes an Engine Local Resource Lock (750-0, 750-1, . . . 750-n).

According to this example embodiment, the LOCK, READ, WRITE, and RELEASE Operations are each implemented by a separate Processing Stage. In response to a request for a resource designated by a Packet (20-0), the First Processing Stage (760-0) determines if a Central Resource Lock Pool Local to Engine (780) contains an Engine Local Resource Lock (750-0, 750-1, . . . 750-n) capable of locking some requested Resource. If available, the First Processing Stage (760-0) LOCKs the Resource. The First Processing Stage (760-0) passes Packet (20-0) along the Packet Bus (200) to the Second Processing Stage (760-1) as Packet (20-1) containing information that a Resource has been locked.

The Second Processing Stage (760-1) is configured to READ from the locked Resource, through the Plurality of Engines (80), in response to the Packet (20-1). The Second Processing Stage (760-1) is further configured to pass the Packet (20-1) along the Packet Bus (200) to the Third Processing Stage (760-2) as the Packet (20-2), which contains information from the READ operation.

The Third Processing Stage (760-2) is configured to WRITE to a Resource, through the Plurality of Engines (80), in response to the Packet (20-2). The Third Processing Stage (760-2) is further configured to pass the Packet (20-2) along the Packet Bus (200) to the Fourth Processing Stage (760-3) as the Packet (20-3), which contains information from the WRITE operation.

The Fourth Processing Stage (760-3) is configured to generate a Resource Lock Release Request in response to the Packet (20-3). The Central Resource Lock Engine (700-0) is configured to receive the Resource Lock Release Request and subsequently release the Resource Lock from the Central Resource Lock Pool Local to Engine (780). The allocated resource has been modified and is now available to be reallocated. The Fourth Processing Stage (760-3) then passes the Packet (20-3) along the Packet Bus (200) as the Packet (20-4).
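The division of LOCK, READ, WRITE, and RELEASE across four stages in FIG. 7 can be sketched as below. The sketch is illustrative only: the class CentralLockEngine, the function run_four_stages, and the update callback are hypothetical, and the four Processing Stages are collapsed into four consecutive steps of one function.

```python
class CentralLockEngine:
    """Hypothetical stand-in for the Central Resource Lock Engine (700-0)."""

    def __init__(self, resource_ids):
        self.pool = {rid: False for rid in resource_ids}   # False means unlocked

    def try_lock(self, rid):
        if self.pool[rid]:
            return False
        self.pool[rid] = True
        return True

    def release(self, rid):
        self.pool[rid] = False


def run_four_stages(engine, resources, rid, update):
    """Trace one packet through four stages, one operation per stage."""
    trace = []
    if not engine.try_lock(rid):        # First Processing Stage: LOCK
        return None
    trace.append("LOCK")
    value = resources[rid]              # Second Processing Stage: READ
    trace.append("READ")
    resources[rid] = update(value)      # Third Processing Stage: WRITE
    trace.append("WRITE")
    engine.release(rid)                 # Fourth Processing Stage: RELEASE
    trace.append("RELEASE")
    return trace


engine = CentralLockEngine(["85-0"])
resources = {"85-0": 10}
trace = run_four_stages(engine, resources, "85-0", lambda v: v + 1)
```

Because the lock state lives in the central engine, release is a direct call to that engine, with no release message travelling between stages.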

In this example embodiment, the Central Resource Lock Engine (700-0) can avoid contention for Remote Resources without employing a Resource Release Bus (300). On the other hand, since the Resource Locks are not local to the Plurality of Processing Stages (60), the Lock & Read Operation (105) cannot be performed within a single Processing Stage. Likewise, the Write & Release Operation (106) also cannot be performed within a single Processing Stage.

FIG. 8 is a highly simplified schematic diagram of a device according to example embodiments. In this example embodiment, the Resource Locks are local to the Engines and not local to the Processing Stages. In FIG. 8, Packet Processing Device (1800) includes a Packet Bus (200), a Packet (20-0, -1, -2) of a Plurality of Parallel Packet Streams (20), a First Processing Stage (860-0), a Second Processing Stage (860-1), Remote Resources (85-0, -1, . . . -n), and a Plurality of Engines (80) containing Resource Lock Engines (880-0, -1, . . . -n).

Each of the Resource Lock Engines (880-0, -1, . . . -n) contains a Respective Resource Lock Pool Local to each Engine (870-0, -1, . . . -n), and each Respective Resource Lock Pool Local to each Engine (870-0, -1, . . . -n) further contains Respective Engine Local Resource Locks (850-0, 851-0, . . . 85n-0; 850-1, 851-1, . . . 85n-1; . . . ; and 850-f, 851-f, . . . 85n-f) according to an example embodiment.

The First Processing Stage (860-0) is configured to perform a Lock & Read Operation (105) in response to a Packet (20-0). The Lock & Read Operation (105) includes accessing a Resource Lock Engine (880-1) to access a Resource Lock Pool Local to Engine (870-1) to request availability of an Engine Local Resource Lock (850-1) corresponding to a Remote Resource (85-1), according to an example embodiment.

The First Processing Stage (860-0) is further configured to LOCK the Remote Resource (85-1) if the Engine Local Resource Lock (850-1) is available. The First Processing Stage (860-0) is further configured to READ a value from the Remote Resource (85-1) and store information of that value into the Execution Context of the Packet (20-0). The First Processing Stage (860-0) then passes the Packet (20-0) along the Packet Bus (200) as a Packet (20-1).

The Second Processing Stage (860-1) is configured to accept the Packet (20-1) and subsequently perform a Write & Release Operation (106) associated with the Remote Resource (85-1) through the Resource Lock Engine (880-1). The Second Processing Stage (860-1) is configured to WRITE to the Remote Resource (85-1) and subsequently send a Resource Lock Release Request to the Resource Lock Engine (880-1). The allocated resource has been modified and is now available to be reallocated.

In the example embodiment of FIG. 8, the Lock & Read Operation (105) can be performed within a single Processing Stage, as can the Write & Release Operation (106). The single Processing Stage, according to an example embodiment, may perform a Lock & Read Operation (105) or a Write & Release Operation (106) within a single stage. The Resource Release Bus (300) is not required. On the other hand, since the Resource Locks are local to each of the Engines, the problem of contention is avoided only for operations seeking the same Remote Resource through the same Engine.
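The limitation noted for FIG. 8 can be illustrated with a small model in which each engine keeps its own lock pool. The class name ResourceLockEngine and its methods are hypothetical; the sketch assumes, for illustration only, that two engines can both reach the same Remote Resource.

```python
class ResourceLockEngine:
    """Hypothetical model of one Resource Lock Engine with its own local lock pool."""

    def __init__(self, name):
        self.name = name
        self.locked = set()        # resource ids currently locked through this engine

    def try_lock(self, rid):
        if rid in self.locked:
            return False
        self.locked.add(rid)
        return True

    def release(self, rid):
        self.locked.discard(rid)


engine_a = ResourceLockEngine("880-0")
engine_b = ResourceLockEngine("880-1")

# Two requests for Remote Resource 85-1 through the same engine are serialized.
same_engine = [engine_b.try_lock("85-1"), engine_b.try_lock("85-1")]

# A request for the same resource through a different engine is not blocked,
# because that engine consults only its own pool.
other_engine = engine_a.try_lock("85-1")
```

The second request through the same engine is refused, while the request through the other engine succeeds, which is the case in which contention is not avoided.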

A hybrid architecture implementing a soft path and a hard path will now be discussed in the context of FIGS. 2 and 3, according to example embodiments.

First, the concept of a “path” will be discussed. In FIG. 2, a path is to be understood as the collection of processing stage resources, and the associated packet processing operations, through which a packet passes during its time traversing the pipeline configuration. For example, in FIG. 2, the path through which packets labeled (P_0, P_0′, P_0″, . . . P_0 f) pass includes Respective Processor (30-0) and Buffer (40-0) of Processing Stage (60-0); Respective Processor (30-1) and Buffer (40-1) of Processing Stage (60-1); and so on until the path is concluded with Respective Processor (30-n) and Buffer (40-f) of Processing Stage (60-f).

As shown in FIG. 3, the Respective Processor (30-0, -1, . . . -n) is implemented in an example embodiment as a plurality of individual processors, each of which is part of the path through which packets labeled (P_0, P_0′, P_0″, . . . P_0 f) pass. Similarly, another path exists for the packets labeled (P_1, P_1′, P_1″, . . . P_1 f) through Respective Processor (31-0) and Buffer (41-0) of Processing Stage (60-0) and on through Respective Processor (31-n) and Buffer (41-f) of Processing Stage (60-f). Still other paths are apparent in FIG. 2. FIGS. 2 and 3, according to an example embodiment, thus illustrate a packet processing device through which multiple paths exist. Multiple streams of packets are handled at the same time along the multiple paths, i.e., are processed in parallel.

Having explained in more detail the concept of a path, and having explained that an example embodiment provides for multiple paths that process respective packet streams in parallel, some differences between soft and hard paths will now be explained. One difference relates to latency, and the other relates to delivery. Hard paths provide fixed latency, and soft paths do not provide fixed latency. A hard path therefore completes the processing of every hard path packet within a certain, well-defined time period, if at all. A soft path does not achieve the processing of soft path packets in any particular time period, and the soft path packet processing is typically implemented on a best effort basis so as not to impede the processing of hard path packets. On the other hand, soft paths guarantee the processing of soft path packets will complete (i.e., guaranteed delivery), while hard paths sometimes drop hard path packets.

In a hybrid architecture, a packet processing device includes hard paths and one or more soft paths.

In the hybrid architecture, according to an example embodiment, hard path packets under certain circumstances are dropped and not fully processed through the pipeline configuration. That is to say, if the processing for a hard path packet requires a Remote Resource (85-0, -1, . . . , -n), but the corresponding Local Resource Lock (50) has a LOCKED state, the hard path packet is dropped. Dropping the hard path packet under such circumstances guarantees that the fixed processing latency is not subverted by a failure to obtain a resource allocation.

In the hybrid architecture, according to an example embodiment, one or more soft paths are configured to allow for a pause in soft path processing in response to a soft path packet being unable to allocate a Remote Resource. That is to say, if the processing for a soft path packet requires a Remote Resource (85-0, -1, . . . , -n), but the corresponding Local Resource Lock (50) has a LOCKED state, the soft path packet remains in Buffer (40) for that particular Processing Stage until the Local Resource Lock (50) has an UNLOCKED state.

According to an example embodiment, in a packet processing device having the hybrid architecture, the Local Resource Lock (50) acts as a semaphore indicating whether the processing for a soft path packet should be paused. That is to say, when a Local Resource Lock (50) corresponding to a Remote Resource (85-0, -1, . . . , -n) required for the subsequent processing of a soft path packet has a LOCKED state, the processing for the soft path packet is thus paused. A beneficial effect of such a pause is that it mitigates the possibility that the processing for a soft path packet might adversely affect the processing of hard path packets.

To put this another way, in an example embodiment, the use of Local Resource Lock (50) provides a mechanism that permits processing of soft path packets to be interleaved with the processing of hard path packets in such a manner that the soft path packet processing is not carried out when remote resources are already allocated to hard path processing, or to other soft path processing.
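The different treatment of hard path and soft path packets when a Local Resource Lock is LOCKED can be sketched as follows. The function name admit, the packet keys "path" and "needs", and the use of a deque for the Buffer are hypothetical choices made only for this illustration.

```python
from collections import deque


def admit(packet, lock_state, soft_buffer, dropped):
    """Return True if the packet may be processed now by this stage."""
    needed = packet.get("needs")
    if needed is None:
        return True                       # no Remote Resource required
    if lock_state[needed] == "UNLOCKED":
        lock_state[needed] = "LOCKED"     # the semaphore is taken for this packet
        return True
    if packet["path"] == "hard":
        dropped.append(packet)            # hard path: drop, so latency stays fixed
    else:
        soft_buffer.append(packet)        # soft path: pause in the buffer until UNLOCKED
    return False


lock_state = {"85-0": "LOCKED"}
soft_buffer = deque()
dropped = []
results = [
    admit({"path": "hard", "needs": "85-0"}, lock_state, soft_buffer, dropped),
    admit({"path": "soft", "needs": "85-0"}, lock_state, soft_buffer, dropped),
    admit({"path": "hard", "needs": None}, lock_state, soft_buffer, dropped),
]
```

Neither path waits in a way that holds up the other: the hard path packet is discarded at once, and the soft path packet is parked until the lock is released.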

According to example embodiments, an Interface Arbiter (120) directs data packets through one of the hard paths, and directs control and management packets along the one or more soft paths.

According to other example embodiments, a packet, for which a Remote Resource must be accessed as part of its processing, is directed along a soft path by Interface Arbiter (120). By directing such packets requiring access to a Remote Resource along the soft path, the interruption of the processing of data packets along the hard path is avoided.
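The two routing policies described for the Interface Arbiter (120) can be sketched as a single selection function. The function name, the policy names, and the packet keys are hypothetical. In the second policy, sending packets that need no Remote Resource to a hard path is an assumption of this sketch; the text states only that packets needing a Remote Resource go to a soft path.

```python
def choose_path(packet, policy="by_type"):
    """Pick 'hard' or 'soft' for a packet under one of two illustrative policies."""
    if policy == "by_type":
        # Data packets take a hard path; control and management packets take a soft path.
        return "hard" if packet["kind"] == "data" else "soft"
    if policy == "by_resource":
        # Packets that must access a Remote Resource take a soft path.
        # Routing the remaining packets to a hard path is an assumption.
        return "soft" if packet.get("needs_remote_resource") else "hard"
    raise ValueError(f"unknown policy: {policy}")


data_path = choose_path({"kind": "data"})
control_path = choose_path({"kind": "control"})
resource_path = choose_path({"kind": "data", "needs_remote_resource": True}, "by_resource")
```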

Although the inventive concept has been described above with respect to the various embodiments, it is noted that there can be a variety of permutations and modifications of the described features by those who are familiar with this field, without departing from the technical ideas and scope of the features, which shall be defined by the appended claims.

Further, while this specification contains many features, the features should not be construed as limitations on the scope of the disclosure or the appended claims. Certain features described in the context of separate embodiments can also be implemented in combination. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination.

Although the drawings describe operations in a specific order and/or show specific arrangements of components, one should not interpret that such specific order and/or arrangements are limited, or that all the operations performed and the components disclosed are needed to obtain a desired result. Accordingly, other implementations are within the scope of the following claims.

What is claimed is:
 1. A packet processing device comprising: a plurality of processing stages, including a first processing stage and a second processing stage arranged as a packet processing pipeline; the first processing stage and the second processing stage each comprising: a respective processor configured to process a packet of a packet stream, and a respective resource manager having a respective local resource lock corresponding to a remote resource; the respective processor being further configured to request the respective resource manager to allocate the remote resource; the respective resource manager being further configured to respond to the request to allocate the remote resource by locking the remote resource with the respective local resource lock and allocating the remote resource; the respective processor being further configured to implement a packet processing operation associated with the allocated remote resource.
 2. The packet processing device according to claim 1, wherein the first processing stage and the second processing stage are further configured to perform the packet processing operation, in association with the allocated remote resource, before passing the packet to a next one of the plurality of processing stages along the processing pipeline.
 3. The packet processing device according to claim 1, further comprising: a resource release bus connected to the plurality of processing stages; and the respective resource manager being further configured to: send to upstream ones of the plurality of processing stages a release request message; and release the remote resource with the respective local resource lock in response to receiving the release request message.
 4. The packet processing device according to claim 2, wherein the respective resource manager is further configured to: send a release request message comprising identifiers for an identified resource manager and for an identified resource lock.
 5. The packet processing device according to claim 2, wherein the respective resource manager is further configured to: receive a release request message comprising identifiers for an identified resource manager and an identified resource lock.
 6. The packet processing device according to claim 1, wherein the respective resource manager is further configured to: detect when the remote resource cannot be locked; and pause implementation of the packet processing operation until the remote resource is locked.
 7. The packet processing device according to claim 1, further comprising the plurality of processing stages being connected in series within a same integrated circuit.
 8. A packet processing device, comprising: a plurality of processing stages arranged as a packet processing pipeline; the processing stages each having processor cores and buffers; the processor cores and buffers of the processing stages defining a plurality of paths, for simultaneous packet processing, through the packet processing pipeline; an ingress front end configured to direct each packet of an incoming stream of packets into one of the plurality of paths; the plurality of paths including a hard path and a soft path, the hard path being configured to process received ones of the incoming stream of packets with fixed latency, the soft path being configured to process received ones of the incoming stream of packets with variable latency; and the processing stages each further including a respective resource manager configured to request allocation of a remote resource, for a given packet of the incoming stream of packets in the soft path, in response to an instruction from one of the processor cores, whereby the remote resource is allocated for processing of only the given packet.
 9. The packet processing device as set forth in claim 8, further comprising the respective resource manager requesting allocation of the remote resource by referring to a semaphore locally available in the one of the plurality of processing stages in which the respective resource manager is included.
 10. The packet processing device as set forth in claim 8, further comprising: a semaphore engine storing a plurality of semaphores, each corresponding to one of a plurality of remote resources; and the respective resource manager requesting allocation of one of the plurality of remote resources by referring to a semaphore controlled by the semaphore engine.
 11. The packet processing device as set forth in claim 8, further comprising: a plurality of engines configured to interface each said respective resource manager with one of a plurality of remote resources; and the respective resource manager requesting allocation of one of the plurality of remote resources by referring to a semaphore controlled by one of the plurality of engines.
 12. A packet processing method, for a processing stage of a pipeline packet processing device, comprising: receiving, at a processor of a first processing stage, a first packet and a request for allocation of a remote resource; responding, by the processor, to the allocation request, by setting a semaphore corresponding to the remote resource to indicate a locked status; implementing a first packet processing operation, in association with the allocated remote resource, and in association with the first packet, to obtain a processed first packet; and outputting the processed first packet to a next processing stage of the pipeline packet processing device.
 13. The method of claim 12, further comprising: receiving a second packet and an indication of an allocation of a remote resource; implementing a second packet processing operation, associated with the allocated remote resource, on the second packet, to obtain a second processed packet; requesting the release of the remote resource; and outputting the second processed packet to a next processing stage of the pipeline packet processing device.
 14. The method of claim 12, further comprising outputting a remote resource release indication to an upstream processing stage of the pipeline packet processing device in response to a completion of the second packet processing operation associated with the allocated remote resource.
 15. The method of claim 12, further comprising obtaining the indication of the allocation of the remote resource from an execution context received together with the processed first packet.
 16. A packet processing device comprising: a plurality of processing stages, arranged as a packet processing pipeline, configured to process packets in a first direction along the packet processing pipeline; a lock release message path connecting the plurality of processing stages, configured to communicate along a second direction opposite to said first direction; each of the plurality of processing stages including: a respective local resource lock, and a respective processor core; and the respective processor core being configured to send, along the lock release message path, a resource lock release request message identifying one of the plurality of processing stages and identifying a resource to be released.
 17. The packet processing device as set forth in claim 16, further comprising a local lock buffer configured to buffer a packet when the respective local resource lock indicates that a corresponding resource is not unlocked.
 18. The packet processing device as set forth in claim 16, wherein each of the plurality of processing stages further comprises a plurality of processor cores, the processor cores and buffers of the plurality of processing stages defining a plurality of paths, for simultaneous packet processing, through the packet processing pipeline.
 19. The packet processing device as set forth in claim 18, further comprising an ingress front end configured to direct each of the packets into one of the plurality of paths.
 20. The packet processing device as set forth in claim 19, further comprising the plurality of paths being configured as a hard path and a soft path, the hard path being configured to process received ones of the packets with fixed latency, and the soft path being configured to process received ones of the packets with variable latency, the ingress front end directing a packet requiring a remote resource to only the soft path.